ABSTRACT


When applying the classical Chi-square test of goodness of fit, it is always assumed that the test statistic is $\chi^2$-distributed. Since this is true only for very large samples, some restrictions on the class frequencies have to be introduced. It is generally accepted that none of the expected frequencies should be less than ten, which makes this test useless for small and moderate samples.

In order to eliminate these restrictions, which are severe from a practical viewpoint, it is proposed to use the exact sampling distribution instead of the limiting $\chi^2$-distribution. When doing so, the test will be called the Eks-square test.

Programs have been written for computing these distributions and the improvements attained have been stated.

The possibilities of using the modified test statistic as a location, scale, and shape operator have been examined and illustrated by numerical examples. Several tables have been prepared.

INTRODUCTION


The most generally used test of goodness of fit for the last seventy years, called the Chi-square test, was introduced by K. Pearson [1]. Its test statistic $X^2$ is defined by

$$X^2 = \sum_{i=1}^{r} \frac{(\nu_i - N p_i)^2}{N p_i} \tag{1}$$

where

$r$ = a finite number of parts (classes) without common points into which the space of the variable has been divided,

$p_i$ = the corresponding values of the given probability function ($\sum p_i = 1$),

$\nu_i$ = the observed class frequencies in the sample of size $N$ ($\sum \nu_i = N$).

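Introducing the expected class frequencies

$$\nu_{0i} = N p_i \tag{2}$$

we have

$$X^2 = \sum_{i=1}^{r} \frac{(\nu_i - \nu_{0i})^2}{\nu_{0i}} = \sum_{i=1}^{r} \frac{\nu_i^2}{\nu_{0i}} - N \tag{3}$$

In particular, if all the expected class frequencies are equal to $\nu_0$, we have

$$X^2_{r/N} = \frac{1}{\nu_0} \sum_{i=1}^{r} \nu_i^2 - N \tag{4}$$

where

$$r = N/\nu_0 \tag{5}$$

Introducing (5) into (4) results in

$$X^2_{r/N} = \frac{r}{N} \sum_{i=1}^{r} \nu_i^2 - N \tag{6}$$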
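As a small illustration of the definition (1) and of the equal-class form (6), the following sketch computes both for one set of class frequencies. The function names and the sample counts are hypothetical and chosen only for this example; they are not taken from the report's programs.

```python
def x_square(observed, probs):
    """Statistic of eq. (1) for observed class counts and class probabilities."""
    n = sum(observed)
    return sum((v - n * p) ** 2 / (n * p) for v, p in zip(observed, probs))


def x_square_equal_classes(observed):
    """Form (6), valid when all r classes have the same probability 1/r."""
    n = sum(observed)
    r = len(observed)
    return r * sum(v * v for v in observed) / n - n


# Hypothetical class frequencies: N = 10 observations in r = 5 classes.
counts = [3, 1, 4, 0, 2]
general = x_square(counts, [0.2] * 5)
shortcut = x_square_equal_classes(counts)
```

Both calls should give the same value, since (6) is only a rearrangement of (1) for equal class probabilities.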

A remarkable property of the statistic $X^2$ is due to the fact that the standardized variable

$$w = (\nu - N p)/\sqrt{N p} = (\nu - \nu_0)/\sqrt{\nu_0} \tag{7}$$

is asymptotically normal (0,1) on the condition that $p$ remains constant when $N \to \infty$. Hence, on certain conditions the random variable

$$w_i = (\nu_i - \nu_{0i})/\sqrt{\nu_{0i}} \tag{8}$$

tends to the random variable $\xi_i$, which is normal (0,1).

Thus,

$$X^2 = \sum_{i=1}^{r} w_i^2 \to \sum_{i=1}^{r} \xi_i^2 \tag{9}$$

that is, $X^2$ is, in the limit, distributed in a $\chi^2$-distribution with $r-1$ degrees of freedom (d.fr.).

The fact that it is always assumed that $X^2$ is $\chi^2$-distributed makes it necessary to introduce some restrictions.

The following conditions are generally accepted (cf. Cramér [2], p. 420): When the $\chi^2$-test is applied in practice, and all the expected frequencies $N p_i$ are $\geq 10$, the limiting $\chi^2$-distribution can be used with an approximation sufficient for ordinary purposes. If some of the $N p_i$ are $< 10$ it is advisable to pool the smaller classes, so that every class contains at least 10 expected observations, before the test is applied. When the observations are so few that this cannot be done, the $\chi^2$ tables should not be used, but some information may still be drawn from the values of the mean and the variance of $X^2$.

The condition $\nu_{0i} = N p_i \geq 10$ restricts, in fact, the use of this test to quite large samples. Considering that there must be at least one degree of freedom and that the number of degrees of freedom is reduced by one unit for each parameter estimated from the sample, it follows that for a two-parametric hypothetical distribution with unknown parameters, say, for the normal distribution, the sample size should not be less than $N = 40$, and for a distribution with three unknown parameters not less than $N = 50$.

Even these restrictions, rather severe from a practical viewpoint, are in some cases not sufficient, as will be demonstrated in the following studies of the exact distributions of the statistics $w$ and $X^2$.

It is evident that, if the exact distribution of $X^2$ is used instead of the limiting $\chi^2$-distribution, then there will be no need of restrictions with regard to $\nu_0$. In view of the fact that any pooling of classes implies a reduction of the amount of information provided by the sample, as will be proved in the following, it seems, in some cases, desirable to use as many and as small classes as possible. Thus, if all sample values are known, then $\nu_0 = 1$ will sometimes be the preferable value, and, if the sample is presented as a table with given class limits and corresponding observed frequencies $\nu_i$, then no pooling of the classes would be undertaken.

In order to distinguish between the two alternatives of using either the limiting $\chi^2$-distribution or the exact distribution of the test statistic $X^2$, these two types of tests will be called the Chi-square and the Eks-square test of goodness of fit. The $X^2$-test will, in general, operate with much smaller values of $\nu_0$, even $\nu_0 = 1$, than will the $\chi^2$-test.

Since the use of the $\chi^2$-distribution as an approximation to that of $X^2$ is based on the assumption that the random variable $w = (\nu - \nu_0)/\sqrt{\nu_0}$ is normally (0,1) distributed, it may be of interest to state its deviations from normality.

To this purpose, some exact distributions of $w$ have been deduced and examined as indicated in the following section.

II

THE EXACT DISTRIBUTION OF $w = (\nu - \nu_0)/\sqrt{\nu_0}$

Let the probability that one single observation falls within the i:th of $r$ classes be $p_i$ ($i = 1, 2, \ldots, r$; $\sum p_i = 1$), then the probability that $N$ independent observations are distributed in such a way that there are $\nu_i$ observations within the i:th class, becomes

$$P = \frac{N!}{\nu_1!\,\nu_2! \cdots \nu_r!}\; p_1^{\nu_1} p_2^{\nu_2} \cdots p_r^{\nu_r} \tag{10}$$

For the special case that we have only two classes with the probabilities $p$ and $1-p$, the probability that $\nu$ observations fall within the first class and $N - \nu$ within the second class is

$$P(\nu) = \frac{N!}{\nu!\,(N-\nu)!}\; p^{\nu} (1-p)^{N-\nu} \tag{11}$$

Introducing

$$p = \nu_0/N \tag{12}$$

we have

$$p(\nu,\nu_0) = \frac{N!}{\nu!\,(N-\nu)!} \cdot \frac{\nu_0^{\nu}\,(N-\nu_0)^{N-\nu}}{N^N} \tag{13}$$

For large values of $N$ the right-hand member of equ. (13) is a quotient of very large integers, which makes it difficult to compute the value of $p(\nu,\nu_0)$ with sufficient precision.

For this reason the following three recurrence formulas have been deduced.

a) $\nu_0$ = constant

$$p(\nu+1,\nu_0) = p(\nu,\nu_0) \cdot \frac{N-\nu}{\nu+1} \cdot \frac{\nu_0}{N-\nu_0} \tag{14}$$

b) $\nu$ = constant

$$p(\nu,\nu_0+1) = p(\nu,\nu_0) \cdot \left(\frac{\nu_0+1}{\nu_0}\right)^{\nu} \left(1 - \frac{1}{N-\nu_0}\right)^{N-\nu} \tag{15}$$

c) $\nu = \nu_0$

$$p(\nu_0+1,\nu_0+1) = p(\nu_0,\nu_0) \cdot \left(\frac{\nu_0+1}{\nu_0}\right)^{\nu_0} \left(1 - \frac{1}{N-\nu_0}\right)^{N-\nu_0-1} \tag{16}$$

From (14) it is easily found that $p(\nu_0,\nu_0)$ is the supremum of each $\nu_0$-column and from (15) that it is the supremum also of each $\nu$-row. Further, it follows from (16) that $p(\nu_0,\nu_0)$ is monotonously decreasing with $\nu_0$.

Based on the preceding formulas, Program 5/71 has been written. Several tables have been prepared for sample sizes up to $N = 10,000$. Some of the results are presented in Table 1 and Table 2. Here the function $F(x)$ is equal to the probability that $w \leq x$. The following two important formulas are verified by the tables.

The expected value of $w$

$$E(w) = 0 \tag{17}$$

The expected value of $w^2$

$$E(w^2) = (N - \nu_0)/N = (r-1)/r \tag{18}$$
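The recurrence with $\nu_0$ held constant can be sketched as follows. This is only an illustration of the idea, not a reproduction of Program 5/71: the function names are hypothetical, the starting value $p(0,\nu_0) = (1 - \nu_0/N)^N$ follows from the two-class probability with $\nu = 0$, and $\nu_0 < N$ is assumed.

```python
import math


def binomial_by_recurrence(n, v0):
    """Return p(v, v0) for v = 0..n, stepping v upward with v0 held constant."""
    probs = [(1.0 - v0 / n) ** n]  # p(0, v0); requires v0 < n
    for v in range(n):
        probs.append(probs[-1] * (n - v) / (v + 1) * v0 / (n - v0))
    return probs


def w_moments(n, v0):
    """E(w) and E(w^2) for w = (v - v0) / sqrt(v0)."""
    probs = binomial_by_recurrence(n, v0)
    w = [(v - v0) / math.sqrt(v0) for v in range(n + 1)]
    mean = sum(p * x for p, x in zip(probs, w))
    second = sum(p * x * x for p, x in zip(probs, w))
    return mean, second


probs = binomial_by_recurrence(5, 1)
f_at_minus_one = probs[0]        # F(w = -1) for N = 5, v0 = 1
f_at_zero = probs[0] + probs[1]  # F(w = 0)  for N = 5, v0 = 1
mean, second = w_moments(1000, 4)
```

For $N = 5$, $\nu_0 = 1$ the first two cumulative values correspond to the entries 32.768 and 73.728 per cent of Table I, and the moments for $N = 1,000$, $\nu_0 = 4$ can be compared with (17) and (18).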

In Table 1 the exact distribution $F(w)$ and the normal distribution $\Phi(w)$ are listed for $\nu_0 = 1$ and several sample sizes $N$ up to $N = 10,000$. There is a principal difference between these two distributions in so far as the former is a discrete and the latter a continuous function. The sample size $N$ has very little effect on $F(w)$, for $N > 50$ practically none at all.

In Table 2 the functions $F(w)$ and $\Phi(w)$ are listed for $N = 1,000$ and $\nu_0 = 1$, 4, and 16. Also for the last value of $\nu_0$ there is a definite difference between the two distributions, as conspicuously demonstrated in Fig. 1.

These results motivate a closer examination of the $X^2$-distributions and their deviations from normality.

We then have to distinguish between two cases:

a) the hypothetical distribution is completely specified

b) certain parameters of the hypothetical function are estimated from the sample.

III

THE HYPOTHETICAL DISTRIBUTION IS COMPLETELY SPECIFIED


Let the hypothetical distribution be specified by the cumulative distribution function (cdf)

$$F[(x - \mu)/\beta] \tag{19}$$

with all parameters known.

If now our sample is presented as a table with fixed class limits $x_c$ and observed class frequencies $\nu_i$, then the expected class frequencies are

$$\nu_{0c} = N \{F[(x_c - \mu)/\beta] - F[(x_{c-1} - \mu)/\beta]\} \tag{20}$$

and we have merely to calculate the test value $X^2$ by introducing the two sets $\nu_i$ and $\nu_{0c}$ into the formula (3).

In order to decide whether this test value corresponds to an acceptable goodness of fit, the sampling distribution of $X^2$, specified by the set $\nu_{0c}$, has to be known. This distribution can be determined by means of a Monte-Carlo procedure according to Program 21/71. It provides the probability of obtaining a value of $X^2$ which is larger than the test value. This program will be described in detail in the following. It should only be mentioned here that generating 10,000 random samples from the hypothetical population, computing for each sample a random value $X^2$ and classifying them takes a computing time of less than 20 sec for a sample of size $N = 10$ and less than 40 sec for $N = 20$.

If all the elements $x_i$ of the sample are known, then we can freely choose the expected frequencies $\nu_{0c}$. The most convenient choice will be to let all expected class frequencies be the same, which makes $r$ classes, each having the expected frequency $\nu_0 = N/r$. The value of $X^2_{r/N}$ is then given by equ. (6).

This particular case will be more closely examined.


3.1 General properties of the statistic $X^2_{r/N}$

From the definition (6) it immediately follows that

$$\inf X^2_{r/N} = 0, \qquad \sup X^2_{r/N} = N(r-1) \tag{21}$$

Since the infimum of $X^2$ corresponds to $\nu_i = N/r$ and $p_i = 1/r$, we have from (10)

$$\mathrm{Prob}(X^2_{r/N} = 0) = \frac{N!}{r^N\,[(N/r)!]^r} \tag{22}$$

which for $\nu_0 = 1$, $r = N$ becomes

$$\mathrm{Prob}(X^2_{N/N} = 0) = N!/N^N \tag{23}$$

Since the supremum of $X^2$ corresponds to the case that any one of the $r$ classes contains $N$ observations we have from (10)

$$\mathrm{Prob}[X^2_{r/N} = N(r-1)] = r^{-(N-1)} \tag{24}$$

which for $\nu_0 = 1$, that is $r = N$, becomes

$$\mathrm{Prob}[X^2_{N/N} = N(N-1)] = N^{-(N-1)} \tag{25}$$

It is a remarkable fact that the expected value is

$$E(X^2_{r/N}) = r - 1 = E(\chi^2_{r-1}) \tag{26}$$

and the variance is

$$\mathrm{Var}(X^2_{r/N}) = 2(r-1)(N-1)/N = \frac{N-1}{N}\,\mathrm{Var}(\chi^2_{r-1}) \tag{27}$$

that is, in spite of quite different distributions, the expected values of $X^2_{r/N}$ and $\chi^2_{r-1}$ are identical and their variances differ only by the factor $(N-1)/N$.

These two formulas have not been theoretically deduced, but they are exactly verified by use of the theoretically deduced distributions of $X^2_{r/N}$ and closely approximated by use of the Monte-Carlo determined distributions, as demonstrated below.

Equ. (26) is, however, a corollary of equ. (18).
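The step from (18) to (26) can be written out as follows, assuming $r$ classes of equal expected frequency $\nu_0 = N/r$, so that the statistic is the sum of the squared standardized class variables $w_i = (\nu_i - \nu_0)/\sqrt{\nu_0}$:

```latex
E\left(X^2_{r/N}\right)
  = E\left(\sum_{i=1}^{r} w_i^2\right)
  = \sum_{i=1}^{r} E\left(w_i^2\right)
  = r \cdot \frac{N-\nu_0}{N}
  = r \cdot \frac{r-1}{r}
  = r - 1
```

Only the linearity of the expectation is used, so the dependence between the class frequencies does not matter for this step.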


3.2 Some exact distributions of $X^2_{r/N}$

From equ. (6) it follows that for given $r$, $N$ the quantity $X^2_{r/N}$ is uniquely determined by the set of observed frequencies $\nu_1, \nu_2, \ldots, \nu_r$.

In the actual case, the probability of obtaining this set is, introducing $p_i = 1/r$ in equ. (10),

$$p = \frac{N!}{r^N\,\nu_1!\,\nu_2! \cdots \nu_r!} \tag{28}$$

Since the value of $X^2$ is independent of any permutation of the frequencies, we have to multiply the probability $p$ by the number of different permutations of $\nu_i$.

For small values of $r$ this is a feasible task. In this way, the distributions of $X^2_{2/10}$, $X^2_{2/20}$ and $X^2_{3/9}$ have been calculated. The results are presented in Table 3.
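The enumeration just described can be sketched in a few lines. This is an illustrative implementation under the stated assumptions ($r$ equally probable classes), not the program used for Table 3; the function names are hypothetical. Exact rational arithmetic is used so that the moments can be compared with the stated mean $r-1$ and variance $2(r-1)(N-1)/N$ without rounding.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def partitions(n, r, largest=None):
    """Yield non-increasing tuples of r non-negative integers summing to n."""
    if largest is None:
        largest = n
    if r == 1:
        if n <= largest:
            yield (n,)
        return
    for first in range(min(n, largest), -1, -1):
        for rest in partitions(n - first, r - 1, first):
            yield (first,) + rest


def exact_distribution(r, n):
    """Exact distribution {x: Prob(X^2 = x)} for r equally probable classes."""
    dist = Counter()
    for freq in partitions(n, r):
        prob = Fraction(factorial(n), r ** n)  # multinomial probability of one ordering
        for v in freq:
            prob /= factorial(v)
        perms = factorial(r)                   # number of distinct permutations of freq
        for multiplicity in Counter(freq).values():
            perms //= factorial(multiplicity)
        x2 = Fraction(r * sum(v * v for v in freq), n) - n
        dist[x2] += prob * perms
    return dict(sorted(dist.items()))


dist = exact_distribution(3, 9)
mean = sum(x * p for x, p in dist.items())
var = sum(x * x * p for x, p in dist.items()) - mean ** 2
```

For $r = 3$, $N = 9$ the probability of $X^2 = 0$ comes out as 1680/19683, about 8.535 per cent, which is the first entry of the corresponding column in Table III.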

For large $r$ it is more convenient to determine the distributions by use of Monte-Carlo studies in the following way.

Let the hypothetical distribution be

$$P_i = r_i = F[(x_i - \mu)/\beta] \tag{29}$$

Putting $P_i = r_i$, where $r_i$ is a random variable uniformly distributed on the interval (0,1), is made in order to emphasize the following, most important property of any continuous distribution function $F(x)$ (cf. Wilks [4], p. 13): If $X$ is a random variable having a continuous cdf $F(x)$ then $F(X)$ is a random variable such that

$$\mathrm{Prob}[F(X) \leq p] = p \tag{30}$$

Hence, the random variables $P_i$ are independently and uniformly distributed on the interval (0,1), just as are $r_i$.

Inverting equ. (29) we have

$$x_i = \beta\,F^{-1}(r_i) + \mu \tag{31}$$

If now one of the classes, into which the space of the variable $x$ has been divided, has the limits $x_a$ and $x_b$ and if $x_i$ falls within this class, that is, if

$$x_a < x_i < x_b \tag{32}$$

then

$$r_a = F[(x_a - \mu)/\beta] < r_i = F[(x_i - \mu)/\beta] < r_b = F[(x_b - \mu)/\beta] \tag{33}$$

because the function $F$ is nondecreasing. From (32) and (33) it can be concluded that the probability of $x_i$ falling within the interval $(x_a, x_b)$ is equal to that of $r_i$ falling within the interval $(r_a, r_b)$.

Thus, if the values of the given sample have been arranged for tabulation purposes into an arbitrary number of classes with the limits $-\infty, x_1, x_2, \ldots, +\infty$, then the corresponding numbers $0, r_1, r_2, \ldots, 1$ are computed by use of (29), from which it also can be concluded that the expected frequency $\nu_{0i}$ of the i:th class is

$$\nu_{0i} = N (r_i - r_{i-1}) \tag{34}$$


From the preceding it follows that, instead of generating a random sample of size $N$ from the population defined by the cdf $F(x)$ and counting the number of elements $x_i$ falling within each of the classes in the space of $x$, an identical result is obtained by taking out at random a set of $N$ random numbers $r_i$ and counting the number of them which fall within each of the classes in the space of $r$.


It is evident that, if the expected frequency of each class is equal to $\nu_0 = N/r$, then the M.-C. procedure is independent of the hypothetical distribution. The only way it enters into the problem consists in the calculation of the class limits by use of equ. (31).
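A minimal sketch of such a Monte-Carlo run in the space of $r$ is given below. It assumes $r$ classes of equal probability, uses Python's standard pseudo-random generator with an arbitrary seed, and the names `x_square_mc` and `q_function` are hypothetical; it is not a reproduction of the report's program.

```python
import random


def x_square_mc(n, r, n_part=10_000, seed=1):
    """Simulate n_part values of X^2 for samples of size n and r equal classes."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_part):
        counts = [0] * r
        for _ in range(n):
            # A uniform number on (0,1) falls in class int(u * r); the min() is a guard.
            counts[min(int(rng.random() * r), r - 1)] += 1
        values.append(r * sum(v * v for v in counts) / n - n)
    return values


def q_function(values, x):
    """Empirical Q(x) = Prob(X^2 > x)."""
    return sum(1 for v in values if v > x) / len(values)


values = x_square_mc(10, 2)
mean = sum(values) / len(values)
q_at_3841 = q_function(values, 3.841)
```

With 10,000 simulated samples the mean should lie close to $r - 1 = 1$ and the estimated tail probability beyond 3.841 close to the exact figure of about 2.1 per cent for $X^2_{2/10}$; the exact values depend on the seed.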


The M.-C. procedure is performed by Program 21/71. For any given sample size $N$, it generates a large number $N_{part}$, say, 10,000, of random samples $r_1, r_2, \ldots, r_N$, and computes from each of them $X^2_{r/N}$ for a selected number of $\nu_0$, for instance, for $N = 10$, $\nu_0 = 1, 2, 5$, and for $N = 20$, $\nu_0 = 1, 2, 5, 10$. Because the same random samples are used for the different values of $\nu_0$, much computing time is saved, since the frequencies corresponding to $\nu_0 > 1$ are obtained by pooling the classes for $\nu_0 = 1$. Also the means and variances of $X^2_{r/N}$ are computed and written down.

The computing times were

59.0 sec for $N = 10$; $r = 2, 5, 10$; $N_{part} = 30,000$

48.6 sec for $N = 20$; $r = 2, 4, 10, 20$; $N_{part} = 10,000$

The distributions for $N = 10$ are presented in Table 4 and a comparison between $X^2_{2/10}$ and $\chi^2$ for one degree of freedom in Fig. 2.


As long as $X^2$ is assumed to be distributed in a $\chi^2$-distribution with $r-1$ degrees of freedom (d.fr.), the value $X^2_p$ corresponding to a given level of significance $p$ is assumed to be equal to the well known and tabulated values $\chi^2_p$. Some of these values are listed below.

Level of                     Values of $\chi^2_p$
significance
p (%)          1 d.fr.    2 d.fr.    9 d.fr.    19 d.fr.
5               3.841      5.991     16.919     30.144
2               5.412      7.824     19.679     33.687
1               6.635      9.210     21.666     36.191
0.1            10.827     13.815     27.877     43.820

The $p$ percent value $\chi^2_p$ of $\chi^2$ for $r$ d.fr. is a value such that the probability that an observed value of $\chi^2$ exceeds $\chi^2_p$ is

$$\mathrm{Prob}(\chi^2 > \chi^2_p) = p/100$$

The error committed by using these values instead of the exact $X^2_p$ can be stated by reading from the exact step functions $Q(x) = \mathrm{Prob}(X^2_{r/N} > x)$ the probabilities corresponding to the assumed values of $p$. Some errors are presented below.

Assumed                  Exact levels of significance (%) of
level
p (%)      $X^2_{2/10}$   $X^2_{2/20}$   $X^2_{3/9}$   $X^2_{10/10}$   $X^2_{20/20}$
5             2.148          4.139          5.045          3.86            3.42
2             2.148          1.182          2.484          2.27            2.15
1             0.195          1.182          0.290          1.26            0.96
0.1           0.000          0.040          0.015          0.16            0.25

For example, the assumed level of significance 5% is actually 2.148% if the test statistic $X^2_{2/10}$ is used and $\chi^2_p$ is taken = 3.841.

It may also be noted that $X^2_{2/20}$, having $\nu_0 = 10$, satisfies the accepted rule $\nu_0 \geq 10$.

Nevertheless, there is a definite difference between the assumed and the true level of significance.
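For two equally probable classes the exact level belonging to an assumed critical value can be checked directly from the binomial distribution. The sketch below is only an illustration; the function name is hypothetical and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def exact_level_two_classes(n, critical):
    """Prob(X^2_{2/N} > critical) when both classes have probability 1/2."""
    favourable = 0
    for v in range(n + 1):
        x2 = 2 * (v * v + (n - v) ** 2) / n - n  # eq. (6) with r = 2
        if x2 > critical:
            favourable += comb(n, v)
    return favourable / 2 ** n


level_10 = exact_level_two_classes(10, 3.841)  # assumed level 5 %, N = 10
level_20 = exact_level_two_classes(20, 3.841)  # assumed level 5 %, N = 20
```

The two results, 22/1024 and 43400/1048576, correspond to the exact levels of 2.148 and 4.139 per cent quoted for $X^2_{2/10}$ and $X^2_{2/20}$.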

3.4 The statistic $X^2_{r/N}$ as a location or a scale operator

If the hypothetical distribution and one of the two parameters, location $\mu$ and scale $\beta$, are specified, then the other parameter can be selected using $X^2_{r/N}$ as a test operator.

The decision power of this operator has been determined by means of Program 22/71.

The principle on which this program is based consists in specifying completely two distributions, including the function, the location and the scale parameter. From the first distribution the class limits for an arbitrary number of equivalent classes $r$ are computed. Then a large number of random samples, say 10,000, belonging to the population specified by the second distribution, are generated. For each sample the value of $X^2_{r/N}$ is computed and the frequency distribution of these 10,000 values is determined. If the two distributions are identical, this program produces the same result as Program 21/71.

The following five alternatives have been run for $N = 10$; $r = 2, 5, 10$.

Distribution   Function    Alt. 1     Alt. 2     Alt. 3     Alt. 4     Alt. 5
                           μ    β     μ    β     μ    β     μ    β     μ    β
1              Normal      0    1     0    1     0    1     0    1     0    1
2              Normal      1    1     2    1     3    1     0    0.5   0    2

By comparing these fifteen frequency distributions with those corresponding to Distribution nr 1, computed by use of Program 21/71, the following decision powers are obtained.

Statistic        μ = 0 vs. 1   0 vs. 2   0 vs. 3   β = 1 vs. 0.5   1 vs. 2
$X^2_{2/10}$        68.5         96.1      99.94         0            0
$X^2_{5/10}$        66.0         97.8      99.95        48.3         33.5
$X^2_{10/10}$       58.7         97.5      99.98        37.9         41.7

It  is  interesting  to  note  that  the  best  value  of  r  depends 
on  the  difference  between  the  parameters. 

IV 

THE  PARAMETERS  OF  THE  HYPOTHETICAL  DISTRIBUTION  ARE 
ESTIMATED  FROM  THE  SAMPLE 

In the preceding the parameters of the hypothetical function were specified, which is a rather exceptional case in the applications.

We will now examine the case that the hypothetical distribution function including its shape parameter, if any, is specified, but that the location and scale parameters have to be estimated from the sample.

In order to emphasize that the class probabilities $p_c$ are functions of the parameters $\mu$ and $\beta$, the definition (1) will be written

$$X^2 = \sum_{c=1}^{r} \frac{[\nu_c - N p_c(\mu,\beta)]^2}{N p_c(\mu,\beta)} \tag{35}$$

where

$$p_c = r_c - r_{c-1} = F[(x_c - \mu)/\beta] - F[(x_{c-1} - \mu)/\beta] \tag{36}$$

and the upper class limits

$$x_c = \beta\,F^{-1}(r_c) + \mu \tag{37}$$

which specify the chosen division of the space of the variable $x$.

If the true values of $\mu$ and $\beta$ are known, the value of $X^2$ is merely calculated by use of (35). In the present case, however, the parameters have to be replaced by their estimates $\hat{\mu}$ and $\hat{\beta}$.

Equ. (35) thus becomes

$$X^2 = \sum_{c=1}^{r} \frac{[\nu_c - N p_c(\hat{\mu},\hat{\beta})]^2}{N p_c(\hat{\mu},\hat{\beta})} \tag{38}$$

For a fixed set of class limits $x_c$ the probabilities $p_c$ will no longer be constant. If, however, the set $r_c$ is fixed, then $p_c$ will be constant, while the class limits $x_c$ will vary from sample to sample. It is obvious that the sampling distribution of $X^2$ will depend upon the estimation method chosen.

The problem of finding the limiting distribution of $X^2$, when one or more of the parameters are estimated from the sample, was solved by R. A. Fisher [3] for a specific method of estimation, viz., the maximum likelihood method. He found that it is only necessary to reduce the number of degrees of freedom of the limiting $\chi^2$-distribution by one unit for each parameter estimated from the sample. This simple and attractive rule is not valid if other estimators are used.

Two alternative methods will be examined: the best linear and the pseudo-standardization methods.

4.1 Best linear estimation of the parameters

The distributions of $X^2_{r/N}$ are computed by use of Program 23/71. From each random sample of size $N$ the parameters are estimated by use of the following formulas

$$\hat{\mu} = \sum c_i x_i, \qquad \hat{\beta} = \sum d_i x_i \tag{39}$$

The coefficients $c_i$, $d_i$ are given by Sarhan & Greenberg [5] for the normal distributions and $N = 2(1)20$, and for the Weibull distributions by Weibull [6] for $N = 5(5)20$ and $\alpha$ = 0.05, 0.1, 0.3, 0.5, 0.7, 1.0, 1.5, 2.0.

Introducing these estimates into the formula

$$x_c = \hat{\beta}\,F^{-1}(c/r) + \hat{\mu} \qquad (c = 0, 1, 2, \ldots, r) \tag{40}$$

the class limits and corresponding class frequencies are obtained.

It is convenient to use instead of $X^2$ the statistic

$$K = N \cdot X^2_{r/N}/2r = \sum \nu_i^2/2 - N^2/2r \tag{41}$$

because $K$ is a non-negative integer and $\inf K = 0$.
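The passage from estimates to class limits, class frequencies and $K$ can be sketched as below for a normal hypothetical distribution. The tabulated best linear coefficients are not reproduced here, so the sample mean and standard deviation are used purely as stand-in estimates; this substitution, the sample values and the function names are assumptions made for illustration only.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, stdev


def class_limits(mu_hat, beta_hat, r):
    """Interior limits x_c = beta_hat * F^{-1}(c/r) + mu_hat, c = 1..r-1 (normal F).

    The outer limits for c = 0 and c = r are minus and plus infinity.
    """
    std_normal = NormalDist()
    return [beta_hat * std_normal.inv_cdf(c / r) + mu_hat for c in range(1, r)]


def k_statistic(sample, limits):
    """Return (K, class frequencies) with K = sum(v_i^2)/2 - N^2/(2r)."""
    r = len(limits) + 1
    counts = [0] * r
    for x in sample:
        counts[bisect_right(limits, x)] += 1
    n = len(sample)
    return sum(v * v for v in counts) / 2 - n * n / (2 * r), counts


# Hypothetical sample of size N = 10; stand-in estimates, not best linear ones.
sample = [-1.2, -0.7, -0.4, -0.1, 0.0, 0.2, 0.5, 0.9, 1.3, 1.8]
limits = class_limits(mean(sample), stdev(sample), 5)
k, counts = k_statistic(sample, limits)
```

Because the class limits move with the estimates, the class probabilities stay fixed at $1/r$ while the observed frequencies, and hence $K$, vary from sample to sample.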


Some distributions

$$Q(x) = \mathrm{Prob}(K > x) \tag{42}$$

for the normal distribution are presented in Table 5.

A test of normality is easily performed by use of these tables. From the given sample under examination the value of $K$ is computed and the probability of having a larger value is read from the table. If this probability is too small, the hypothesis of normality is rejected. The same test can be used for other hypothetical distributions, if the necessary tables have been prepared.

4.2 Pseudo-standardization of the samples

If the estimates $\hat{\mu}$ and $\hat{\beta}$ are replaced by $x_1$ and $(x_N - x_1)$, respectively, the pseudo-standardized variable

$$t_i = (x_i - x_1)/(x_N - x_1) \tag{43}$$

is obtained. Since $t_1 = 0$ and $t_N = 1$, the size of the transformed sample is equal to $(N - 2)$.

For properly chosen class limits $t_c$, the modified test statistic

$$K = \sum \nu_i^2/2 - (N-2)^2/2r \tag{44}$$

can be computed by use of Program 24/71.
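A minimal sketch of the pseudo-standardization and of the modified statistic follows. The report does not state here how the class limits $t_c$ are chosen, so they are passed in as a parameter; the single limit 0.5 used in the example, the sample values and the function names are hypothetical.

```python
from bisect import bisect_right


def pseudo_standardize(sample):
    """Return the N - 2 interior values t_i = (x_i - x_1)/(x_N - x_1)."""
    xs = sorted(sample)
    lo, hi = xs[0], xs[-1]
    return [(x - lo) / (hi - lo) for x in xs[1:-1]]


def k_modified(t_values, limits):
    """Return (K, class frequencies) with K = sum(v_i^2)/2 - (N-2)^2/(2r)."""
    r = len(limits) + 1
    counts = [0] * r
    for t in t_values:
        counts[bisect_right(limits, t)] += 1
    m = len(t_values)  # m = N - 2
    return sum(v * v for v in counts) / 2 - m * m / (2 * r), counts


# Hypothetical sample of size N = 10 and a hypothetical class limit for r = 2.
sample = [2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 6.1, 6.8, 7.7, 9.3]
t_values = pseudo_standardize(sample)
k, counts = k_modified(t_values, [0.5])
```

The transformed values do not change if the sample is shifted or rescaled, which is what makes the statistic usable without separate estimates of location and scale.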

In Table 6 the results for $N = 10$, $r = 2, 4, 8$ and for normal and exponential distributions are presented.

These tables can be used for testing normality and exponentiality.


4.3 The statistic $X^2_{r/N}$ as a shape operator

When $X^2_{r/N}$ and $K$ are computed from samples which have location and scale invariance, as in the two above-mentioned alternatives, both of them can be used as shape operators. For example, from the distributions in Table 6 it is easy to compute the power of deciding between normal and exponential distributions.

The result for $N = 10$ is

r  =   2      4      8
DP =  35.5   41.9   39.0 %

This decision power is not very good, to some extent due to the fact that $X^2_{r/N}$ is independent of permutations in the observed values of $\nu_i$. For instance, for $r = 2$, $N = 10$, the set $\nu_1 = 0$, $\nu_2 = 10$ yields the same value of $X^2_{2/10}$ as $\nu_1 = 10$, $\nu_2 = 0$.

A substantial improvement can be obtained by separating the probabilities of

$\nu_1, \nu_2$ = 0, 10 and 10, 0
                 1, 9 and 9, 1
                 2, 8 and 8, 2
                 etc.

By doing so the value above of DP = 35.5% was raised to DP = 55.0%. In some other cases, the improvement was still better. These results have motivated the introduction of a new test operator, denoted by VI. Its properties and usefulness will be demonstrated in a following Scientific Report.
Table I - The exact distribution $F(w)$ of the random variable $w = (\nu - \nu_0)/\sqrt{\nu_0}$ and the normal distribution $\Phi(w)$ for $\nu_0 = 1$ and various sample sizes $N$ (values in per cent)

                                        F(w)
 ν     w     N = 5       10        20        50       100     1,000    10,000      Φ(w)
 0    -1    32.768    34.868    35.848    36.417    36.603    36.770    36.786    15.866
 1     0    73.728    73.610    73.583    73.557    73.576    73.576    73.576    50.000
 2     1    94.208    92.981    92.451    92.157    92.063    91.979    91.972    84.134
 3     2    99.328    98.720    98.409    98.224    98.163    98.107    98.103    97.725
 4     3    99.968    99.836    99.742    99.679    99.657    99.636    99.635    99.865
 5     4   100.000    99.985    99.967    99.953    99.947    99.941    99.941    99.997
 6     5      -       99.999    99.997    99.995    99.994    99.992    99.992   100.000
 7     6      -      100.000   100.000   100.000   100.000    99.999    99.999      -
 8     7      -         -         -         -         -      100.000   100.000      -

 E(w)        0         0         0         0         0         0         0
 E(w^2)      0.80      0.90      0.95      0.98      0.99      0.999     0.9999
Table II - The exact distribution $F(w)$ and the normal distribution $\Phi(w)$ of the random variable $w = (\nu - \nu_0)/\sqrt{\nu_0}$ for $N = 1,000$ and $\nu_0 = 1$, 4, and 16 (values in per cent)

$\nu_0 = 1$

 ν     w       F(w)       Φ(w)
 0    -1     36.770     15.866
 1     0     73.576     50.000
 2     1     91.979     84.134
 3     2     98.107     97.725
 4     3     99.636     99.865
 5     4     99.941     99.997
 6     5     99.992    100.000
 7     6     99.999       -
 8     7    100.000       -

E(w) = 0; E(w^2) = 0.999

$\nu_0 = 4$

 ν     w        F(w)       Φ(w)
 0   -2.0      1.817      2.275
 1   -1.5      9.114      6.681
 2   -1.0     23.752     15.866
 3   -0.5     43.300     30.854
 4    0.0     62.884     50.000
 5    0.5     78.545     69.146
 6    1.0     88.975     84.134
 7    1.5     94.923     93.319
 8    2.0     97.888     97.725
 9    2.5     99.201     99.379
10    3.0     99.723     99.865
11    3.5     99.912     99.977
12    4.0     99.974     99.997
13    4.5     99.993    100.000
14    5.0     99.998       -
15    5.5    100.000       -

E(w) = 0; E(w^2) = 0.996
Table II (Continued)

$\nu_0 = 16$

 ν      w        F(w)       Φ(w)        ν      w        F(w)       Φ(w)
 0   -4.00      0.000      0.003       19    0.75     81.393     77.336
 1   -3.75      0.000      0.009       20    1.00     86.996     84.134
 2   -3.50      0.001      0.023       21    1.25     91.248     89.434
 3   -3.25      0.008      0.058       22    1.50     94.324     93.319
 4   -3.00      0.037      0.135       23    1.75     96.451     95.993
 5   -2.75      0.130      0.298       24    2.00     97.859     97.725
 6   -2.50      0.380      0.621       25    2.25     98.753     98.777
 7   -2.25      0.957      1.223       26    2.50     99.298     99.379
 8   -2.00      2.122      2.275       27    2.75     99.618     99.702
 9   -1.75      4.210      4.007       28    3.00     99.799     99.865
10   -1.50      7.575      6.681       29    3.25     99.897     99.942
11   -1.25     12.499     10.566       30    3.50     99.949     99.977
12   -1.00     19.098     15.866       31    3.75     99.975     99.991
13   -0.75     27.253     22.664       32    4.00     99.988     99.997
14   -0.50     36.601     30.854       33    4.25     99.994     99.999
15   -0.25     46.593     40.130       34    4.50     99.997    100.000
16    0.00     56.595     50.000       35    4.75     99.999       -
17    0.25     66.009     59.870       36    5.00    100.000       -
18    0.50     74.368     69.146        -      -         -          -

E(w) = 0; E(w^2) = 0.984
Table III - Some exact distributions of the statistic $X^2_{r/N}$ for completely specified hypothetical distributions (values in per cent)

p(x) = Prob($X^2$ = x)
P(x) = Prob($X^2 \leq$ x)
Q(x) = Prob($X^2$ > x)

$X^2_{2/10}$

   x        p(x)       P(x)       Q(x)
  0.0     24.609     24.609     75.391
  0.4     41.016     65.625     34.375
  1.6     23.438     89.062     10.938
  3.6      8.789     97.852      2.148
  6.4      1.953     99.805      0.195
 10.0      0.195    100.000      0.000

E($X^2$) = 1.0; Var($X^2$) = 1.8

$X^2_{2/20}$

   x        p(x)       P(x)       Q(x)
  0.0     17.620     17.620     82.380
  0.2     32.036     49.656     50.344
  0.8     24.027     73.682     26.318
  1.8     14.786     88.468     11.532
  3.2      7.393     95.861      4.139
  5.0      2.957     98.818      1.182
  7.2      0.924     99.742      0.258
  9.8      0.217     99.960      0.040
 12.8      0.036     99.996      0.004
 16.2      0.004    100.000      0.000
 20.0      0.000    100.000      0.000

E($X^2$) = 1.0; Var($X^2$) = 1.9

$X^2_{3/9}$

    x        p(x)       P(x)       Q(x)
  0.000     8.535      8.535     91.465
  0.667    38.409     46.944     53.056
  2.000    21.125     68.069     31.931
  2.667    15.364     83.432     16.568
  4.667    11.523     94.955      5.045
  6.000     2.561     97.516      2.484
  8.000     1.097     98.613      1.387
  8.667     1.097     99.710      0.290
 12.667     0.274     99.985      0.015
 18.000     0.015    100.000      0.000

E($X^2$) = 2.0; Var($X^2$) = 3.55556
Table IV - Some M.-C. determined distributions of $X^2_{r/N}$ for completely specified hypothetical distributions (values in per cent)

p(x) = Prob($X^2$ = x)
P(x) = Prob($X^2 \leq$ x)
Q(x) = Prob($X^2$ > x)

$X^2_{2/10}$

   x       p(x)      P(x)      Q(x)
  0.0     24.52     24.52     75.48
  0.4     40.96     65.48     34.52
  1.6     23.46     88.94     11.06
  3.6      9.07     98.01      1.99
  6.4      1.85     99.86      0.14
 10.0      0.14    100.00      0.00

E($X^2$) = 0.99839          r - 1 = 1.00000
Var($X^2$) = 1.74553        2(r-1)(N-1)/N = 1.80000

$X^2_{5/10}$

   x       p(x)      P(x)      Q(x)
   0       1.23      1.23     98.77
   1      15.47     16.70     83.30
   2      15.60     32.30     67.70
   3      19.47     51.77     48.23
   4      12.52     64.29     35.71
   5      15.32     79.61     20.39
   6       3.05     82.66     17.34
   7       7.81     90.47      9.53
   8       4.07     94.54      5.46
   9       1.62     96.16      3.84
  10       0.22     96.38      3.62
  11       2.27     98.65      1.35
  12       0.46     99.11      0.89
  13       0.45     99.56      0.44
  15       0.03     99.59      0.41
  16       0.16     99.75      0.25
  17       0.19     99.94      0.06
  19       0.02     99.96      0.04
  23       0.02     99.98      0.02
  24       0.02    100.00      0.00

E($X^2$) = 3.98033          r - 1 = 4.00000
Var($X^2$) = 7.09714        2(r-1)(N-1)/N = 7.20000
Table IV (Continued)


      x      p(x)      P(x)      Q(x)
      0      0.04      0.04     99.96
      2      1.60      1.64     98.36
      4     11.31     12.95     87.05
      6     21.41     34.36     65.64
      8     22.54     56.90     43.10
     10     19.32     76.22     23.78
     12      8.07     84.29     15.71
     14      8.84     93.13      6.87
     16      3.01     96.14      3.86
     18      1.59     97.73      2.27
     20      1.01     98.74      1.26
     22      0.68     99.42      0.58
     24      0.29     99.71      0.29
     26      0.13     99.84      0.16
     28      0.02     99.86      0.14
     30      0.06     99.92      0.08
     32      0.06     99.98      0.02
     34      0.01     99.99      0.01
     42      0.01    100.00      0.00

E(X²)   =  8.98307       r - 1           =  9.00000
Var(X²) = 15.89416       2(r-1)(N-1)/N   = 16.20000
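
The comparison made at the foot of each part of Table IV, sample moments against r - 1 and 2(r-1)(N-1)/N, can be repeated with a small simulation. The sketch below is only an illustration under stated assumptions: equiprobable classes, Python's standard pseudo-random generator, and an arbitrary number of trials and seed. The names simulate_x2 and sample_moments are hypothetical, and the output will not reproduce the tabulated figures digit for digit.

```python
import random


def simulate_x2(r, N, trials, seed=0):
    """Draw `trials` samples of size N over r equiprobable classes (assumed)
    and return the list of X^2 values."""
    rng = random.Random(seed)          # fixed seed so that a run can be repeated
    expected = N / r
    values = []
    for _ in range(trials):
        counts = [0] * r
        for _ in range(N):
            counts[rng.randrange(r)] += 1      # each class has probability 1/r
        values.append(sum((c - expected) ** 2 for c in counts) / expected)
    return values


def sample_moments(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / (n - 1)
    return mean, var


if __name__ == "__main__":
    r, N = 5, 10
    mean, var = sample_moments(simulate_x2(r, N, trials=10_000))
    print("E(X2)   estimate:", round(mean, 5), " theory:", r - 1)
    print("Var(X2) estimate:", round(var, 5), " theory:", 2 * (r - 1) * (N - 1) / N)
```

With a few thousand trials the estimated mean and variance should fall close to the theoretical values, in the same way as the tabulated pairs 3.98033 against 4.00000 and 7.09714 against 7.20000.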
Table V - Some Q-functions of K for normal dbn and two
best-linearly estimated parameters
(Q in per cent)


               N = 10                           N = 20
   K    r = 2    r = 5   r = 10     r = 2    r = 4   r = 10   r = 20
   0    58.98    94.23    99.40     70.40    96.30    99.96   100.00
   1     9.36    61.55    94.13     24.53    70.66    99.33   100.00
   2      -      44.02    75.14      -       63.51    95.65    99.90
   3      -      22.65    52.71      -       42.60    88.50    99.19
   4      .47    15.20    28.73      5.25    36.44    78.00    96.53
   5      -       7.12    13.49      -       27.37    65.34    89.92
   6      -       6.09     7.93      -       24.70    52.82    78.41
   7      -       2.59     3.04      -       14.17    41.94    63.47
   8      -        .89     1.44      -       13.42    31.78    48.15
   9      .00      .78      .76       .60     9.79    23.33    33.88
  10      -        .73      .47      -        8.12    17.19    22.67
  11      -        .27      .15      -        6.54    12.61    14.43
  12      -        .16      .03      -        5.12     8.68     9.10
  13      -        .06      .02      -        2.95     6.04     5.70
  14      -       -        -         -       -         4.11     3.71
  15      -       -        -         -        2.05     3.02     2.30
  16      -        .05      .00       .01     1.96     2.20     1.30
  17      -        .00     -         -        1.38     1.76      .83
  18      -       -        -         -        1.10     1.24      .47
  19      -       -        -         -         .56      .90      .31
  20      -       -        -         -         .43      .58      .16
  21      -       -        -         -         .27      .44      .09
  22      -       -        -         -         .23      .37      .05
  23      -       -        -         -         .16      .29      .03
  24      -       -        -         -        -         .16      .02
  25      -       -        -          .00      .07      .11      .01

K = Σν_i²/2 - N²/(2r)
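
For equiprobable classes the statistic K = Σν_i²/2 - N²/(2r) is a linear function of X², since X² = (r/N)Σν_i² - N = (2r/N)K. The short Python sketch below illustrates this relation with exact arithmetic; it assumes equal class probabilities 1/r, and the function names k_statistic and x2_equiprobable are hypothetical.

```python
from fractions import Fraction


def k_statistic(counts):
    """K = sum(v_i^2)/2 - N^2/(2r) for the observed class frequencies `counts`."""
    N = sum(counts)
    r = len(counts)
    return Fraction(sum(c * c for c in counts), 2) - Fraction(N * N, 2 * r)


def x2_equiprobable(counts):
    """X^2 with all expected frequencies equal to N/r (assumed equiprobable classes)."""
    N = sum(counts)
    r = len(counts)
    expected = Fraction(N, r)
    return sum((c - expected) ** 2 for c in counts) / expected


if __name__ == "__main__":
    # r = 2, N = 10: the only attainable values of K are the squares 0, 1, 4, 9, 16, 25.
    for v in range(5, 11):
        counts = [v, 10 - v]
        print(counts, "K =", k_statistic(counts), " X2 =", float(x2_equiprobable(counts)))
```

Because K takes integer values for the class numbers and sample sizes shown in Table V, it is a convenient tabulation argument. For r = 2 only perfect squares occur, which is why most rows of the r = 2 columns carry no entry.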
Table VI - Some Q-functions of K for pseudo-standardized
sample from normal and exponential populations
(Q in per cent)


              Normal dbn                 Exponential dbn
   K    r = 2    r = 4    r = 8      r = 2    r = 4    r = 8
   0    78.08    97.70    99.83      92.98    99.34    99.08
   1    39.71    72.33    94.72      75.26    91.58    98.56
   2      -      63.72    73.38       -       88.37    91.30
   3      -      38.86    51.47       -       75.21    81.34
   4    14.12    30.57    27.51      49.66    70.31    65.05
   5      -      18.73    18.16       -       60.63    57.21
   6      -      16.13    11.03       -       56.33    46.74
   7      -       7.29     3.92       -       40.73    32.22
   8      -       6.15     3.13       -       39.54    30.08
   9     2.79     4.11     2.19      20.07    35.74    25.69
  10      -       -        1.12       -       -        18.01
  11      -       1.98      .30       -       23.50    11.50
  12      -        .56     -          -       15.19    11.18
  13      -       -         .21       -       -        10.19
  15      -       -         .06       -       -         4.70
  16      .00     -         .02        .00    -         3.02
  17      -        .07     -          -        3.94    -
  21      -       -         .00       -       -          .38
  24      -        .00     -          -        .00     -
  28      -       -        -          -       -          .00

K = Σν_i²/2 - (N - 2)²/(2r)


r = 2 :  0.50000
r = 4 :  0.30500, 0.50000, 0.69500
r = 8 :  0.18307, 0.30500, 0.40663, 0.50000, 0.59337, 0.69500, 0.81693